Overview

Dataset statistics

Number of variables: 13
Number of observations: 523890
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 41.5 MiB
Average record size in memory: 83.0 B

Variable types

Numeric: 9
Boolean: 3
Categorical: 1

Warnings

pos_r is highly correlated with pos_t (High correlation)
pos_phi is highly correlated with mom_phi (High correlation)
pos_t is highly correlated with pos_r (High correlation)
mom_p is highly correlated with isHiggs and 2 other fields (High correlation)
mom_phi is highly correlated with pos_phi (High correlation)
isHiggs is highly correlated with mom_p and 2 other fields (High correlation)
isZ is highly correlated with mom_p and 2 other fields (High correlation)
label is highly correlated with mom_p and 2 other fields (High correlation)
pos_r is highly correlated with pos_theta and 1 other field (High correlation)
pos_theta is highly correlated with pos_r and 1 other field (High correlation)
pos_t is highly correlated with pos_r and 1 other field (High correlation)
isHiggs is highly correlated with isZ and 1 other field (High correlation)
isZ is highly correlated with isHiggs and 1 other field (High correlation)
label is highly correlated with isHiggs and 1 other field (High correlation)
isOther is highly correlated with label (High correlation)
mom_mass is highly correlated with pid and 1 other field (High correlation)
isHiggs is highly correlated with label and 2 other fields (High correlation)
pid is highly correlated with mom_mass (High correlation)
mom_theta is highly correlated with pos_theta (High correlation)
pos_phi is highly correlated with pos_theta and 1 other field (High correlation)
pos_theta is highly correlated with mom_mass and 2 other fields (High correlation)
label is highly correlated with isOther and 3 other fields (High correlation)
isZ is highly correlated with isHiggs and 2 other fields (High correlation)
label is highly correlated with isHiggs and 2 other fields (High correlation)
mom_p has unique values (Unique)
mom_theta has unique values (Unique)
mom_phi has unique values (Unique)
pos_r has 215111 (41.1%) zeros (Zeros)
pos_theta has 215111 (41.1%) zeros (Zeros)
pos_phi has 215111 (41.1%) zeros (Zeros)
pos_t has 215111 (41.1%) zeros (Zeros)
mom_mass has 261398 (49.9%) zeros (Zeros)

Reproduction

Analysis started: 2021-07-04 12:24:16.352271
Analysis finished: 2021-07-04 12:25:16.600422
Duration: 1 minute and 0.25 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

pid
Real number (ℝ)

HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.32070282
Minimum: -2212
Maximum: 2212
Zeros: 0
Zeros (%): 0.0%
Negative: 136667
Negative (%): 26.1%
Memory size: 4.0 MiB

Quantile statistics

Minimum: -2212
5th percentile: -211
Q1: -12
Median: 22
Q3: 130
95th percentile: 211
Maximum: 2212
Range: 4424
Interquartile range (IQR): 142

Descriptive statistics

Standard deviation: 475.4657624
Coefficient of variation (CV): 35.69374446
Kurtosis: 15.74687602
Mean: 13.32070282
Median Absolute Deviation (MAD): 34
Skewness: -0.08537914953
Sum: 6978583
Variance: 226067.6912
Monotonicity: Not monotonic
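
Several of the figures reported for pid are derived from one another, so they can be cross-checked by hand. The sketch below is only an illustration, assuming the standard definitions (variance as the square of the standard deviation, CV as standard deviation over mean, IQR as Q3 minus Q1); the variable names are hypothetical and the inputs are the rounded values printed in this report, so results agree only up to rounding.

```python
# Rounded values copied from the pid statistics in this report.
mean = 13.32070282
std = 475.4657624
q1, q3 = -12, 130
minimum, maximum = -2212, 2212
n_observations = 523890

variance = std ** 2                 # reported: 226067.6912
cv = std / mean                     # reported: 35.69374446
iqr = q3 - q1                       # reported: 142
value_range = maximum - minimum     # reported: 4424
total = mean * n_observations       # reported Sum: 6978583

print(f"variance={variance:.4f} cv={cv:.6f} iqr={iqr} "
      f"range={value_range} sum={total:.0f}")
```

The same relations hold for the other numeric variables; a CV with a very large magnitude (as for pos_phi and mom_phi) simply reflects a mean close to zero.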
Histogram with fixed size bins (bins=20)
Value              Count    Frequency (%)
22                 236924   45.2%
211                95863    18.3%
-211               95654    18.3%
-321               13840    2.6%
321                13760    2.6%
130                13503    2.6%
12                 11105    2.1%
-12                11060    2.1%
-2212              5873     1.1%
2212               5798     1.1%
Other values (10)  20510    3.9%

Value   Count   Frequency (%)
-2212   5873    1.1%
-2112   5534    1.1%
-321    13840   2.6%
-211    95654   18.3%
-16     125     < 0.1%
-14     1079    0.2%
-13     991     0.2%
-12     11060   2.1%
-11     2511    0.5%
11      2466    0.5%

Value   Count    Frequency (%)
2212    5798     1.1%
2112    5609     1.1%
321     13760    2.6%
211     95863    18.3%
130     13503    2.6%
22      236924   45.2%
16      125      < 0.1%
14      980      0.2%
13      1090     0.2%
12      11105    2.1%

pos_r
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 147916
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.82737822
Minimum: 0
Maximum: 20675.42931
Zeros: 215111
Zeros (%): 41.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 5.131018041 × 10^-5
Q3: 0.2494305618
95th percentile: 113.9591273
Maximum: 20675.42931
Range: 20675.42931
Interquartile range (IQR): 0.2494305618

Descriptive statistics

Standard deviation: 251.6969232
Coefficient of variation (CV): 7.025267707
Kurtosis: 572.0904676
Mean: 35.82737822
Median Absolute Deviation (MAD): 5.131018041 × 10^-5
Skewness: 17.96524994
Sum: 18769605.18
Variance: 63351.34117
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      215111   41.1%
3.109283469            10       < 0.1%
0.2541469399           10       < 0.1%
1.412746095            9        < 0.1%
0.6010237942           9        < 0.1%
0.2115099823           9        < 0.1%
0.06376061778          8        < 0.1%
0.0425469599           8        < 0.1%
1.589326681            8        < 0.1%
0.3806201846           7        < 0.1%
Other values (147906)  308701   58.9%

Value                 Count    Frequency (%)
0                     215111   41.1%
1.321541899 × 10^-9   2        < 0.1%
3.288187914 × 10^-9   2        < 0.1%
1.127240242 × 10^-8   2        < 0.1%
1.174375335 × 10^-8   2        < 0.1%
1.252954158 × 10^-8   2        < 0.1%
1.388321658 × 10^-8   2        < 0.1%
1.417202206 × 10^-8   2        < 0.1%
1.424466981 × 10^-8   2        < 0.1%
1.919384125 × 10^-8   2        < 0.1%

Value         Count   Frequency (%)
20675.42931   2       < 0.1%
13743.31145   2       < 0.1%
12255.97393   2       < 0.1%
11738.9314    2       < 0.1%
11052.18253   2       < 0.1%
10680.22521   2       < 0.1%
10315.76274   2       < 0.1%
10256.94449   2       < 0.1%
10256.94359   2       < 0.1%
10160.51661   2       < 0.1%

pos_theta
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 147916
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9272984453
Minimum: 0
Maximum: 3.130104787
Zeros: 215111
Zeros (%): 41.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.8109428138
Q3: 1.724584543
95th percentile: 2.542850852
Maximum: 3.130104787
Range: 3.130104787
Interquartile range (IQR): 1.724584543

Descriptive statistics

Standard deviation: 0.9324573161
Coefficient of variation (CV): 1.005563334
Kurtosis: -1.204525554
Mean: 0.9272984453
Median Absolute Deviation (MAD): 0.8109428138
Skewness: 0.4391705094
Sum: 485802.3825
Variance: 0.8694766464
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      215111   41.1%
0.8589685805           10       < 0.1%
2.625098116            10       < 0.1%
1.837176593            9        < 0.1%
1.520985848            9        < 0.1%
1.573417931            9        < 0.1%
1.837377081            8        < 0.1%
1.1480704              8        < 0.1%
0.6255795275           8        < 0.1%
2.444067965            7        < 0.1%
Other values (147906)  308701   58.9%

Value           Count    Frequency (%)
0               215111   41.1%
0.01565781315   2        < 0.1%
0.01566686588   3        < 0.1%
0.01628507316   2        < 0.1%
0.01711867925   2        < 0.1%
0.01787172869   2        < 0.1%
0.01938913531   2        < 0.1%
0.02064406145   1        < 0.1%
0.02175498977   2        < 0.1%
0.02178225576   2        < 0.1%

Value         Count   Frequency (%)
3.130104787   2       < 0.1%
3.128635859   2       < 0.1%
3.128635846   2       < 0.1%
3.128091755   3       < 0.1%
3.125393951   2       < 0.1%
3.123746626   2       < 0.1%
3.12224134    2       < 0.1%
3.121973987   2       < 0.1%
3.121037687   2       < 0.1%
3.12102188    2       < 0.1%

pos_phi
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 147864
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.003248919521
Minimum: -3.141589072
Maximum: 3.141488565
Zeros: 215111
Zeros (%): 41.1%
Negative: 154953
Negative (%): 29.6%
Memory size: 4.0 MiB

Quantile statistics

Minimum: -3.141589072
5th percentile: -2.609384477
Q1: -0.4842947578
Median: 0
Q3: 0.4649110527
95th percentile: 2.607050637
Maximum: 3.141488565
Range: 6.283077636
Interquartile range (IQR): 0.9492058105

Descriptive statistics

Standard deviation: 1.392739216
Coefficient of variation (CV): -428.6776594
Kurtosis: 0.04902840625
Mean: -0.003248919521
Median Absolute Deviation (MAD): 0.4740099868
Skewness: -0.0001752078931
Sum: -1702.076448
Variance: 1.939722524
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      215111   41.1%
-1.481148159           10       < 0.1%
0.925401528            10       < 0.1%
-2.178383369           9        < 0.1%
-0.7076902522          9        < 0.1%
-0.6443454288          9        < 0.1%
-1.968330542           8        < 0.1%
-1.167121913           8        < 0.1%
1.339836338            8        < 0.1%
-1.501756522           7        < 0.1%
Other values (147854)  308701   58.9%

Value          Count   Frequency (%)
-3.141589072   2       < 0.1%
-3.141333637   2       < 0.1%
-3.141330808   3       < 0.1%
-3.141218116   2       < 0.1%
-3.141213569   2       < 0.1%
-3.141151012   2       < 0.1%
-3.14113539    2       < 0.1%
-3.141133051   2       < 0.1%
-3.141110815   2       < 0.1%
-3.140993957   2       < 0.1%

Value         Count   Frequency (%)
3.141488565   2       < 0.1%
3.141430697   2       < 0.1%
3.141361984   2       < 0.1%
3.141328549   2       < 0.1%
3.141316367   1       < 0.1%
3.14130782    2       < 0.1%
3.141238566   2       < 0.1%
3.141180817   2       < 0.1%
3.141154842   2       < 0.1%
3.141154481   2       < 0.1%

pos_t
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 147669
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.46604689
Minimum: 0
Maximum: 20682.55744
Zeros: 215111
Zeros (%): 41.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 5.469365928 × 10^-5
Q3: 0.2592700083
95th percentile: 122.7358632
Maximum: 20682.55744
Range: 20682.55744
Interquartile range (IQR): 0.2592700083

Descriptive statistics

Standard deviation: 252.4496483
Coefficient of variation (CV): 6.922868525
Kurtosis: 566.4122854
Mean: 36.46604689
Median Absolute Deviation (MAD): 5.469365928 × 10^-5
Skewness: 17.8508682
Sum: 19104197.31
Variance: 63730.8249
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      215111   41.1%
0.2735360133           10       < 0.1%
3.208061146            10       < 0.1%
0.6883983004           9        < 0.1%
1.475074245            9        < 0.1%
0.6379185762           9        < 0.1%
1.817783237            8        < 0.1%
0.07353922287          8        < 0.1%
0.04902620546          8        < 0.1%
1.934518745            7        < 0.1%
Other values (147659)  308701   58.9%

Value                 Count    Frequency (%)
0                     215111   41.1%
1.999456332 × 10^-9   2        < 0.1%
5.437000528 × 10^-9   2        < 0.1%
1.137207877 × 10^-8   2        < 0.1%
1.266879093 × 10^-8   2        < 0.1%
1.536747415 × 10^-8   2        < 0.1%
1.594279841 × 10^-8   2        < 0.1%
1.697779313 × 10^-8   2        < 0.1%
1.945356913 × 10^-8   2        < 0.1%
2.007894421 × 10^-8   2        < 0.1%

Value         Count   Frequency (%)
20682.55744   2       < 0.1%
13747.79149   2       < 0.1%
12270.57156   2       < 0.1%
11741.16556   2       < 0.1%
11054.72039   2       < 0.1%
10687.00584   2       < 0.1%
10316.4145    2       < 0.1%
10258.29859   4       < 0.1%
10173.44392   2       < 0.1%
10088.39541   2       < 0.1%

mom_p
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 523890
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.5635358
Minimum: 0.0004017200743
Maximum: 78.1807838
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0.0004017200743
5th percentile: 0.09219266417
Q1: 0.4276857475
Median: 1.194596419
Q3: 3.442734899
95th percentile: 19.70707746
Maximum: 78.1807838
Range: 78.18038208
Interquartile range (IQR): 3.015049151

Descriptive statistics

Standard deviation: 10.79091911
Coefficient of variation (CV): 2.364596135
Kurtosis: 20.95740229
Mean: 4.5635358
Median Absolute Deviation (MAD): 0.9549745514
Skewness: 4.422035627
Sum: 2390790.77
Variance: 116.4439353
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0.6379559984           1        < 0.1%
0.9802428343           1        < 0.1%
1.741703104            1        < 0.1%
1.134568181            1        < 0.1%
0.2665870765           1        < 0.1%
0.7335075248           1        < 0.1%
2.782271205            1        < 0.1%
1.978348171            1        < 0.1%
0.5031865176           1        < 0.1%
36.80955825            1        < 0.1%
Other values (523880)  523880   > 99.9%

Value             Count   Frequency (%)
0.0004017200743   1       < 0.1%
0.0006314008636   1       < 0.1%
0.0007112982751   1       < 0.1%
0.0008050747499   1       < 0.1%
0.0009187268328   1       < 0.1%
0.001109436925    1       < 0.1%
0.001260449352    1       < 0.1%
0.001468048982    1       < 0.1%
0.001472858267    1       < 0.1%
0.001479125851    1       < 0.1%

Value         Count   Frequency (%)
78.1807838    1       < 0.1%
78.18077104   1       < 0.1%
78.17726023   1       < 0.1%
78.17396114   1       < 0.1%
78.17285036   1       < 0.1%
78.17158499   1       < 0.1%
78.17071626   1       < 0.1%
78.16984971   1       < 0.1%
78.16739028   1       < 0.1%
78.1663704    1       < 0.1%

mom_theta
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 523890
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.571314593
Minimum: 0.003447101248
Maximum: 3.139177111
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0.003447101248
5th percentile: 0.463304147
Q1: 1.052298399
Median: 1.571644949
Q3: 2.088804202
95th percentile: 2.68383875
Maximum: 3.139177111
Range: 3.13573001
Interquartile range (IQR): 1.036505803

Descriptive statistics

Standard deviation: 0.6784044163
Coefficient of variation (CV): 0.4317432165
Kurtosis: -0.7963667357
Mean: 1.571314593
Median Absolute Deviation (MAD): 0.5182879822
Skewness: 0.003163044846
Sum: 823196.002
Variance: 0.4602325521
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
2.121827361            1        < 0.1%
1.127709356            1        < 0.1%
1.894106388            1        < 0.1%
1.645172239            1        < 0.1%
0.5123234469           1        < 0.1%
2.922814051            1        < 0.1%
2.127515895            1        < 0.1%
0.219976207            1        < 0.1%
1.918943361            1        < 0.1%
2.365234006            1        < 0.1%
Other values (523880)  523880   > 99.9%

Value            Count   Frequency (%)
0.003447101248   1       < 0.1%
0.003469401599   1       < 0.1%
0.005321450579   1       < 0.1%
0.005645090662   1       < 0.1%
0.0063671429     1       < 0.1%
0.006408109402   1       < 0.1%
0.006683624021   1       < 0.1%
0.007229995062   1       < 0.1%
0.007351265922   1       < 0.1%
0.008955810456   1       < 0.1%

Value         Count   Frequency (%)
3.139177111   1       < 0.1%
3.13865275    1       < 0.1%
3.137510503   1       < 0.1%
3.137246888   1       < 0.1%
3.136847471   1       < 0.1%
3.135266629   1       < 0.1%
3.135097951   1       < 0.1%
3.13384206    1       < 0.1%
3.133476024   1       < 0.1%
3.132516798   1       < 0.1%

mom_phi
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct: 523890
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.001858482681
Minimum: -3.141583493
Maximum: 3.141590759
Zeros: 0
Zeros (%): 0.0%
Negative: 262524
Negative (%): 50.1%
Memory size: 4.0 MiB

Quantile statistics

Minimum: -3.141583493
5th percentile: -2.823393021
Q1: -1.574115916
Median: -0.006785427
Q3: 1.574189431
95th percentile: 2.823762666
Maximum: 3.141590759
Range: 6.283174253
Interquartile range (IQR): 3.148305347

Descriptive statistics

Standard deviation: 1.813322318
Coefficient of variation (CV): -975.7004123
Kurtosis: -1.203226292
Mean: -0.001858482681
Median Absolute Deviation (MAD): 1.573982566
Skewness: 0.003482621188
Sum: -973.6404918
Variance: 3.28813783
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
-0.3964367327          1        < 0.1%
2.297844709            1        < 0.1%
1.572672958            1        < 0.1%
-3.059062467           1        < 0.1%
-2.103024796           1        < 0.1%
-2.678470412           1        < 0.1%
1.682209852            1        < 0.1%
-2.060800928           1        < 0.1%
-1.194083062           1        < 0.1%
0.9200304069           1        < 0.1%
Other values (523880)  523880   > 99.9%

Value          Count   Frequency (%)
-3.141583493   1       < 0.1%
-3.141580634   1       < 0.1%
-3.1415638     1       < 0.1%
-3.141559946   1       < 0.1%
-3.141544847   1       < 0.1%
-3.141542478   1       < 0.1%
-3.141529285   1       < 0.1%
-3.141526393   1       < 0.1%
-3.141509263   1       < 0.1%
-3.14150156    1       < 0.1%

Value         Count   Frequency (%)
3.141590759   1       < 0.1%
3.141555111   1       < 0.1%
3.141547009   1       < 0.1%
3.141539749   1       < 0.1%
3.141537017   1       < 0.1%
3.141535679   1       < 0.1%
3.141508933   1       < 0.1%
3.141503416   1       < 0.1%
3.14150236    1       < 0.1%
3.141486297   1       < 0.1%

mom_mass
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1311676541
Minimum: 0
Maximum: 0.9395700097
Zeros: 261398
Zeros (%): 49.9%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.0005109999911
Q3: 0.1395699978
95th percentile: 0.4976100028
Maximum: 0.9395700097
Range: 0.9395700097
Interquartile range (IQR): 0.1395699978

Descriptive statistics

Standard deviation: 0.21810892
Coefficient of variation (CV): 1.662825499
Kurtosis: 5.865233072
Mean: 0.1311676541
Median Absolute Deviation (MAD): 0.0005109999911
Skewness: 2.465418255
Sum: 68717.42232
Variance: 0.04757150097
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value             Count    Frequency (%)
0                 261398   49.9%
0.1395699978      191517   36.6%
0.4936800003      27600    5.3%
0.4976100028      13503    2.6%
0.9382699728      11671    2.2%
0.9395700097      11143    2.1%
0.0005109999911   4977     1.0%
0.105659999       2081     0.4%

Value             Count    Frequency (%)
0                 261398   49.9%
0.0005109999911   4977     1.0%
0.105659999       2081     0.4%
0.1395699978      191517   36.6%
0.4936800003      27600    5.3%
0.4976100028      13503    2.6%
0.9382699728      11671    2.2%
0.9395700097      11143    2.1%

Value             Count    Frequency (%)
0.9395700097      11143    2.1%
0.9382699728      11671    2.2%
0.4976100028      13503    2.6%
0.4936800003      27600    5.3%
0.1395699978      191517   36.6%
0.105659999       2081     0.4%
0.0005109999911   4977     1.0%
0                 261398   49.9%

isHiggs
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 511.7 KiB

Value   Count    Frequency (%)
True    503735   96.2%
False   20155    3.8%

isZ
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 511.7 KiB

Value   Count    Frequency (%)
False   503890   96.2%
True    20000    3.8%

isOther
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 511.7 KiB

Value   Count    Frequency (%)
False   523735   > 99.9%
True    155      < 0.1%

label
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 523890
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Most occurring characters

Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Most occurring categories

Value            Count    Frequency (%)
Decimal Number   523890   100.0%

Most frequent character per category

Decimal Number
Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Most occurring scripts

Value    Count    Frequency (%)
Common   523890   100.0%

Most frequent character per script

Common
Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   523890   100.0%

Most frequent character per block

ASCII
Value   Count    Frequency (%)
0       503735   96.2%
1       20000    3.8%
2       155      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
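
The calculation described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code the report generator itself runs; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
# Shifting and rescaling ys by a positive factor leaves r unchanged.
print(pearson_r(xs, ys), pearson_r(xs, [10 * y + 3 for y in ys]))
```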

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
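
As a rough sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. The helper names below are hypothetical, and assigning tied values the average of their ranks is an assumption of this sketch (it is the common convention, but the text above does not specify tie handling).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A cubic relationship is nonlinear but monotonic, so the ranks agree exactly.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```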

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
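
A direct translation of that definition is sketched below. It divides by the total number of pairs, which corresponds to the variant usually called tau-a; library implementations often use a tie-adjusted variant instead, so results can differ when the data contain ties. The function name is hypothetical, and the quadratic loop is meant for illustration rather than for a dataset of this size.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```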

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
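
For orientation, the bias correction can be sketched from a contingency table of counts. This is an illustrative implementation of the commonly cited form of Bergsma's correction, written with the standard library only; it is not taken from the report generator's source, the function name is hypothetical, and no continuity correction is applied to the chi-squared statistic.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2:
        raise ValueError("need at least two observations")
    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated table
print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent table
```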

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     pid    pos_r  pos_theta  pos_phi  pos_t  mom_p      mom_theta  mom_phi    mom_mass  isHiggs  isZ    isOther  label
0    12     0.0    0.0        0.0      0.0    57.517587  2.785539   -2.578892  0.00000   False    True   False    1
1    -12    0.0    0.0        0.0      0.0    47.253770  0.986708   -0.963875  0.00000   False    True   False    1
2    2112   0.0    0.0        0.0      0.0    3.378490   2.085558   1.405320   0.93957   True     False  False    0
3    -2212  0.0    0.0        0.0      0.0    2.899987   1.983336   1.315393   0.93827   True     False  False    0
4    321    0.0    0.0        0.0      0.0    3.256114   1.777417   1.411150   0.49368   True     False  False    0
5    -211   0.0    0.0        0.0      0.0    2.257151   0.843947   1.467199   0.13957   True     False  False    0
6    -211   0.0    0.0        0.0      0.0    0.589905   0.908871   2.463460   0.13957   True     False  False    0
7    -211   0.0    0.0        0.0      0.0    2.088260   0.700026   -2.404715  0.13957   True     False  False    0
8    -211   0.0    0.0        0.0      0.0    1.136461   0.589039   -1.753818  0.13957   True     False  False    0
9    211    0.0    0.0        0.0      0.0    0.800887   0.890430   -1.411619  0.13957   True     False  False    0

Last rows

        pid   pos_r      pos_theta  pos_phi    pos_t      mom_p      mom_theta  mom_phi    mom_mass  isHiggs  isZ    isOther  label
523880  22    0.000266   0.803973   0.389445   0.000266   0.244316   0.942980   0.356160   0.00000   True     False  False    0
523881  22    5.174263   0.799260   0.388213   5.182875   0.315674   1.141822   0.527419   0.00000   True     False  False    0
523882  321   5.174263   0.799260   0.388213   5.182875   7.409359   0.789252   0.352566   0.49368   True     False  False    0
523883  -211  5.174263   0.799260   0.388213   5.182875   8.702533   0.811139   0.431911   0.13957   True     False  False    0
523884  22    47.644578  1.553182   -1.714619  55.214441  0.064417   0.950241   1.743621   0.00000   True     False  False    0
523885  22    47.644578  1.553182   -1.714619  55.214441  0.092689   1.407499   -2.085675  0.00000   True     False  False    0
523886  22    47.644611  1.553182   -1.714619  55.214445  0.621426   1.619202   -1.747628  0.00000   True     False  False    0
523887  22    47.644611  1.553182   -1.714619  55.214445  0.206108   1.607157   -1.368038  0.00000   True     False  False    0
523888  211   5.174263   0.799260   0.388213   5.182875   13.300955  0.783574   0.373155   0.13957   True     False  False    0
523889  -211  5.174263   0.799260   0.388213   5.182875   2.612154   0.830888   0.395118   0.13957   True     False  False    0